Sample Complexity and Performance Bounds for Non-Parametric Approximate Linear Programming

Authors

  • Jason Pazis
  • Ronald Parr
Abstract

One of the most difficult tasks in value function approximation for Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function while not overfitting the training samples. Recent results in non-parametric approximate linear programming (NP-ALP) have demonstrated that this can be done effectively using nothing more than a smoothness assumption on the value function. In this paper, we extend these results to the case where samples come from real-world transitions instead of the full Bellman equation, adding robustness to noise. In addition, we provide the first max-norm, finite-sample performance guarantees for any form of ALP. NP-ALP is amenable to problems with large (multidimensional) or even infinite (continuous) action spaces, and it does not require a model to select actions using the resulting approximate solution.
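
The phrase "nothing more than a smoothness assumption" can be illustrated with a minimal sketch. It assumes, for illustration only, that smoothness means Lipschitz continuity with a known constant L under Euclidean distance; the abstract does not specify the exact form. Under that assumption, values known at sampled states bound the value at any other state from above. The function `lipschitz_upper_envelope` is hypothetical and shows only this extrapolation step; it is not the authors' NP-ALP formulation, which additionally solves a linear program over the samples.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def lipschitz_upper_envelope(
    x: Point,
    samples: Sequence[Point],
    values: Sequence[float],
    lipschitz_const: float,
) -> float:
    """Hypothetical helper: evaluate min_i (v_i + L * ||x - x_i||).

    Any L-Lipschitz function (Euclidean distance) that takes value v_i at
    sample x_i is bounded above by this quantity at x. If the sample values
    are themselves mutually L-Lipschitz, the envelope also reproduces v_i
    exactly at each sample point.
    """
    if not samples or len(samples) != len(values):
        raise ValueError("need a non-empty set of samples with one value each")
    # Each term is an upper bound implied by one sample; keep the tightest.
    return min(
        v + lipschitz_const * math.dist(x, s) for s, v in zip(samples, values)
    )


if __name__ == "__main__":
    # Two 1-D samples with L = 2: the midpoint is bounded by the nearer,
    # lower-valued sample.
    print(lipschitz_upper_envelope((0.5,), [(0.0,), (1.0,)], [0.0, 1.0], 2.0))
```

Taking the minimum over samples gives the tightest bound that the Lipschitz assumption alone justifies; a larger L makes the bound looser away from the sampled states.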

Similar resources

Providing a Method for Solving Interval Linear Multi-Objective Problems Based on the Goal Programming Approach

Most research has focused on multi-objective issues in its definitive form, with decision-making coefficients and variables assumed to be objective and constraint functions. In fact, due to inaccurate and ambiguous information, it is difficult to accurately identify the values of the coefficients and variables. Interval arithmetic is appropriate for describing and solving uncertainty and inaccu...

A Non-linear Integer Bi-level Programming Model for Competitive Facility Location of Distribution Centers

The facility location problem is a strategic decision-making for a supply chain, which determines the profitability and sustainability of its components. This paper deals with a scenario where two supply chains, consisting of a producer, a number of distribution centers and several retailers provided with similar products, compete to maintain their market shares by opening new distribution cent...

Comparison of Gene Expression Programming (GEP) and Parametric and Non-parametric Regression Methods in the Prediction of the Mean Daily Discharge of Karun River (A case Study: Mollasani Hydrometric Station)

Nowadays, the prediction of river discharge is one of the important issues in hydrology and water resources; the results of daily river discharge pattern could be used in the management of water resources and hydraulic structures and flood prediction. In this research, Gene Expression Programming (GEP), parametric Linear Regression (LR), parametric Nonlinear Regression (NLR) and non-parametric ...

Non-parametric Approximate Dynamic Programming via the Kernel Method

This paper presents a novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees. In particular, we establish both theoretically and computationally that our proposal can serve as a viable alternative to state-of-the-art parametric ADP algorithms, freeing the designer from carefully specifying an approximation archite...

On the optimization of Dombi non-linear programming

Dombi family of t-norms includes a parametric family of continuous strict t-norms, whose members are increasing functions of the parameter. This family of t-norms covers the whole spectrum of t-norms when the parameter is changed from zero to infinity. In this paper, we study a nonlinear optimization problem in which the constraints are defined as fuzzy relational equations (FRE) with the Dombi...

Publication date: 2013